# Embedded DRAM (eDRAM) Power-Energy Estimation Using Signal Swing-Based Analytical Model

Yong-Ha PARK<sup> $\dagger a$ </sup>, Jeonghoon KOOK<sup> $\dagger$ </sup>, and Hoi-Jun YOO<sup> $\dagger$ </sup>, Nonmembers

**SUMMARY** An embedded-DRAM (eDRAM) power-energy estimation model is proposed for system-on-a-chip (SoC) applications. Its main feature is the signal swing based analytical (SSBA) model, which improves on the accuracy of conventional SRAM power-energy models. Power-energy estimation using the SSBA model shows 95% accuracy compared with transistor-level power simulation for three fabricated eDRAMs. The SSBA model combined with a high-level simulator provides fast and accurate system-level power-energy estimation of eDRAM.

*key words:* eDRAM, power estimation

## 1. Introduction

Various embedded DRAM macros have been presented for system-on-a-chip (SoC) applications as deep sub-micron technology has matured [1]-[6]. Their random access cycle time has been reduced to a value comparable with that of low-power SRAM [1]-[4], and their wide I/O interfaces enable bandwidths of several GB/s or more on a single chip [5], [6]. However, research on memory power-energy estimation has been restricted to cache memory or SRAM, which have a significant impact on the power consumption of the overall memory system [7], [8]. In off-chip designs, the power-energy consumption of DRAM could be ignored because the off-chip I/O interconnection consumes one or two orders of magnitude more power-energy than the standalone DRAM itself [8]. In SoC designs, the situation is different: the power consumed by on-chip interconnection is reduced to the same order as that of the eDRAM because of the large reduction in on-chip I/O capacitance. To obtain a power-energy-efficient eDRAM architecture in the early design phase, a fast and accurate eDRAM power analysis is essential, especially for battery-driven applications.

However, the previous SRAM models suffer from poor accuracy when applied to eDRAMs because of the following drawbacks. An analytical SRAM power-energy model suffers from estimation error because of its inaccurate modeling of analog signals [7]. The mixed approach, which combines an analytical model for digital signals with a simulation model for analog signals, reduces the percentage error compared with the completely analytical model [8]. However, the detailed circuit simulation and handcrafted physical layout needed for a more precise result lead to a long verification time, and the percentage error still ranges from 1% to 32% depending on which analog schemes are applied for high-speed or energy-efficient operation [8]. In DRAM operation, analog signals are used more widely than in SRAM operation in order to achieve high speed, low power and noise margin; examples include various pre-charge voltage levels, small-swing operation, differential signaling and equalization schemes. The trade-off between estimation time and accuracy is therefore a more serious problem for early power-energy estimation of eDRAM than of SRAM. In this letter, a simple and accurate eDRAM power-energy model based on signal swing characteristics is presented and verified against transistor-level power simulation results for three fabricated eDRAMs, each of which has its own special signal characteristics.

## 2. Signal Swing Based Analytical (SSBA) eDRAM Power-Energy Model

## 2.1 Switching Current Ratio

Two current flows, the load capacitance current $(I_C)$ and the switching current $(I_S)$, are involved in MOS power dissipation when the load capacitance is charged or discharged, as shown in Fig. 1. $I_C$ is controlled by the capacitor terminal voltage and depends on the time constant. A basic energy calculation shows that the power supply $V_{CC}$ provides an energy of $CV^2$ when $I_C$ charges the load capacitance, as shown in Fig. 1(a), and zero when $I_C$ discharges it, as shown in Fig. 1(b).
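The $CV^2$ figure follows from integrating the supply power over the charging transient. As a short worked step, assuming an ideal supply $V_{CC}$ that charges a capacitance $C$ from 0 to $V_{CC}$ through a resistive path (the symbol $v$ for the instantaneous capacitor voltage is introduced here only for illustration):

$$E_{supply} = \int V_{CC} \, I_C \, dt = V_{CC} \int_0^{V_{CC}} C \, dv = C V_{CC}^2$$

Half of this energy, $\tfrac{1}{2} C V_{CC}^2$, ends up stored on the capacitor and the other half is dissipated in the charging path. During discharge the stored half is dissipated without drawing any further energy from the supply.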

Manuscript received September 19, 2001.

Manuscript revised December 5, 2001.

<sup>†</sup>The authors are with the Department of EE, Korea Advanced Institute of Science and Technology (KAIST), 373-1, Kusung-dong Yusong-gu, Taejon, 305-701, Korea.

a) E-mail: yhpark@ssl.kaist.ac.kr

**Fig. 1** (a) and (b) energy consumption by load current $(I_C)$ and switching current $(I_S)$ for charging and discharging, respectively, (c) and (d) simplified switching current modeling for $I_C$ and $I_S$, respectively.

The energy consumed by $I_S$ is transformed into additional energy consumed by an equivalent load capacitance of $\alpha C_L$, as shown in Figs. 1(c) and (d). This is possible because the switching current, which is controlled by the channel width of the driving transistors, is proportional to the load capacitance when the slew rate at the input gate terminal of the driving transistors is kept constant. The coefficient $\alpha$ is obtained from the slope of the energy difference between $E_{IC+IS}$ and $E_{IC}$ for selected capacitance values, i.e., $\Delta (E_{IC+IS} - E_{IC})/\Delta C_L$. $E_{IC+IS}$, the energy supplied from $V_{CC}$ through both $I_C$ and $I_S$, is obtained from SPICE simulation. $E_{IC}$, the energy supplied from $V_{CC}$ through $I_C$ only, is obtained from analytical calculation. This transformation simplifies the switching current modeling procedure for various buffering strategies.
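The extraction of $\alpha$ in Sect. 2.1 can be sketched as a straight-line fit. The snippet below is a minimal illustration, not the authors' tooling: the function names and the data are hypothetical, the simulated energies stand in for SPICE results, and it assumes that the fitted slope is normalized by $V_{CC}^2$ so that $\alpha$ is a dimensionless ratio.

```python
def fit_slope(xs, ys):
    """Least-squares slope of ys versus xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def switching_current_ratio(c_loads, e_total, vcc):
    """Estimate alpha from supply energies measured at several loads.

    c_loads : load capacitances C_L in farads
    e_total : supply energy E_IC+IS per charging event in joules
    vcc     : supply voltage in volts
    """
    # E_IC is the analytical charging energy C_L * VCC^2.
    e_diff = [e - c * vcc ** 2 for c, e in zip(c_loads, e_total)]
    # Assumption: dividing the slope by VCC^2 makes alpha dimensionless.
    return fit_slope(c_loads, e_diff) / vcc ** 2


# Hypothetical data standing in for SPICE results (alpha = 0.2 by design).
VCC = 2.0
ALPHA_TRUE = 0.2
C_LOADS = [0.1e-12, 0.2e-12, 0.5e-12, 1.0e-12]
E_TOTAL = [(1 + ALPHA_TRUE) * c * VCC ** 2 for c in C_LOADS]

alpha = switching_current_ratio(C_LOADS, E_TOTAL, VCC)
print(f"alpha = {alpha:.3f}")
```

With real simulation data the points will not lie exactly on a line, so the fit residual gives a rough indication of how well the proportionality assumption holds for a given driver.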

## 2.2 Signal Swing Ratio

When the voltage level is switched between $a \cdot V_{CC}$ and GND or between $(1-a) \cdot V_{CC}$ and $V_{CC}$, as shown in Figs. 2(a) and (b), the total energy provided by the power supply is $a^2 \cdot CV_{CC}^2$ or $a \cdot CV_{CC}^2$, respectively, where $a$ is smaller than 1 for a small swing and equal to 1 for a full $V_{CC}$ swing.

When the signal is discharged from the initial voltage level, as shown in Figs. 2(d) and (e), the energy provided by the power supply is zero. This is because the energy stored in the capacitance is dissipated either in the distributed parasitic resistance R or in another power supply. The energy of $a^2 \cdot CV_{CC}^2$ or $a \cdot CV_{CC}^2$ is provided by the power supply only when the signals are charged from the initial voltage level, as shown in Figs. 2(c) and (f), respectively. Part of the energy provided by the power supply is dissipated in the distributed parasitic resistance R. The remainder is stored in the load capacitance C and is dissipated through R when the load capacitance is discharged. For a digital signal with full $V_{CC}$ swing, however, the energy provided by the power supply is always $CV_{CC}^2$ since $a = 1$, as in the conventional models of [7]–[9].
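The two swing cases above can be written as small helper functions. This is a minimal sketch that simply encodes the expressions stated in the text; the function names and the numerical values are hypothetical.

```python
def check_swing(a):
    """The swing coefficient a must satisfy 0 < a <= 1."""
    if not 0 < a <= 1:
        raise ValueError("swing coefficient a must be in (0, 1]")


def supply_energy_low_swing(a, c, vcc):
    """Signal switching between GND and a*VCC: a^2 * C * VCC^2 per charge."""
    check_swing(a)
    return a ** 2 * c * vcc ** 2


def supply_energy_high_swing(a, c, vcc):
    """Signal switching between (1-a)*VCC and VCC: a * C * VCC^2 per charge."""
    check_swing(a)
    return a * c * vcc ** 2


# Hypothetical example: 1 pF load, 2 V supply, half swing.
C_LOAD = 1e-12
VCC = 2.0
e_low = supply_energy_low_swing(0.5, C_LOAD, VCC)    # 0.25 * C * VCC^2
e_high = supply_energy_high_swing(0.5, C_LOAD, VCC)  # 0.5 * C * VCC^2
e_full = supply_energy_high_swing(1.0, C_LOAD, VCC)  # C * VCC^2
print(e_low, e_high, e_full)
```

Both functions reduce to $CV_{CC}^2$ at $a = 1$, which matches the full-swing digital case. Discharging transitions draw no supply energy and are therefore not modeled.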

Fig. 2 Arbitrary voltage swings and their equivalent circuits for the power-energy calculation.

Fig. 3 Energy calculation for (a) half $V_{CC}$ pre-charged full swing signal and (b) full $V_{CC}$ pre-charged arbitrary swing.

In DRAM design, differential signals with the same pre-charged voltage level are frequently used, as shown in Fig. 3. The energy provided by the power supply is summarized according to the pre-charged voltage level in Figs. 3(a) and (b), assuming that each signal drives the same load capacitance C. When differential signals pre-charged to half $V_{CC}$ are evaluated to GND or $V_{CC}$ and then return to the initial voltage of half $V_{CC}$, as in Fig. 3(a), the energy provided by the power supply is $0.75 CV_{CC}^2$. If charge recycling is applied through the equalization technique, this value is reduced to $0.5 CV_{CC}^2$, because two equal capacitances storing the opposite voltage levels of $V_{CC}$ and GND are naturally equalized to half $V_{CC}$ without any energy from the power supply. When only one of the differential signals, pre-charged to $V_{CC}$, is discharged to an arbitrary voltage level and returns to the initial voltage level of $V_{CC}$ (the remaining signal is held at $V_{CC}$), as in Fig. 3(b), the energy from the power supply is always $a \cdot CV_{CC}^2$ for any $0 < a \leq 1$, even with equalization. This is because charge recycling halves the signal swing but doubles the load capacitance that must be driven back to the $V_{CC}$ level.
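The values quoted for Fig. 3 can be reproduced from the swing expressions of this section. The following worked step is one way to account for them, assuming that the evaluation phase follows the $(1-a) \cdot V_{CC}$ to $V_{CC}$ case and the pre-charge phase follows the GND to $a \cdot V_{CC}$ case, both with $a = 0.5$ (the subscripts below are labels introduced for illustration). For the half $V_{CC}$ pre-charged pair of Fig. 3(a):

$$E_{eval} = a \cdot CV_{CC}^2 = 0.5\,CV_{CC}^2, \qquad E_{pre} = a^2 \cdot CV_{CC}^2 = 0.25\,CV_{CC}^2, \qquad E_{eval} + E_{pre} = 0.75\,CV_{CC}^2$$

With equalization, the pre-charge phase needs no supply energy, so only $E_{eval} = 0.5\,CV_{CC}^2$ remains. For the $V_{CC}$ pre-charged pair of Fig. 3(b), equalization leaves both lines at $(1 - a/2) \cdot V_{CC}$, and restoring a capacitance of $2C$ through a swing of $(a/2) \cdot V_{CC}$ from the $V_{CC}$ supply costs

$$E_{restore} = 2C \cdot \tfrac{a}{2} V_{CC} \cdot V_{CC} = a \cdot CV_{CC}^2$$

which is the same as restoring a single line through the full swing of $a \cdot V_{CC}$.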



Fig. 4 eDRAM architecture for power-energy model.

## 2.3 Signal Swing Based Analytical Model

Overall eDRAM power-energy consumption is dominated by the power-energy consumption of the large capacitive loads, as in the previous SRAM power-energy models [7]–[9]. This is because about 85% of DRAM power-energy is dissipated by the capacitive loads during the operation of the word line (WL), high voltage supply line (RX), bit line (BL), sensing-restoring line (SRTO), data bus line (DB) and I/O line (IO), as shown in Fig. 4 [10]. Timing control signals and combinational logic consume less than 15% of the total energy, which can be easily estimated by the methods in [9], [10]. In this letter, the general eDRAM architecture shown in Fig. 4 is considered for the SSBA power-energy model. Each bank is assumed to contain M BL pairs, M SAs, K subword drivers, L DB line pairs, and L IOs that interface with other embedded components. Equations (1a)–(1d) summarize the SSBA eDRAM power-energy models of the WL, BL, DB, and IO lines. Each of $C_{WL}$, $C_{RX}$, $C_{BL}$, $C_{SRTO}$, $C_{DB}$ and $C_{IO}$ contains its line capacitance and the MOS junction/gate capacitance connected to the corresponding line. Each capacitance component also includes the switching current ratio $\alpha$ of its MOS driver, as described in Sect. 2.1. $S_{BL}$, $S_{DB\_R}$, $S_{DB\_W}$ and $S_{IO}$ are the signal swing coefficients for each swing characteristic, as described in Sect. 2.2. Here, $\sigma_X$ is the average bit update ratio between the previous data and the currently accessed data for the corresponding signal. The energy consumed by each signal is obtained as the product of the corresponding C, S, $\sigma$, and the square of the supply voltage (V), as shown in Eq. (1).

Power-energy consumption by the BLSA isolation logic $(C_{ISO})$ is additionally considered if the shared BLSA structure is used, as described in Eq. (1b). The DB model should be separated for read and write operations, as described in Eqs. (1c-1) and (1c-2), respectively. This is because most DRAMs adopt a small DB swing for fast read operation, while they use full swing for the write operation in order to overwrite the data latched at the BLSA. In the DB write model, an additional BL swing should be included, as described by the last term of Eq. (1c-2), $\sigma_{BL}LC_{BL}V_{CC}^2$, because the BL pairs selected by the column address show a full $V_{CC}$ swing if the written data differ from the previous data latched at the BLSA.

$$E_{WL} = (C_{WL} + C_{RX}) \cdot V_{PP}^2 \tag{1a}$$

$$E_{BL} = S_{BL} M (C_{BL} + C_{SRTO}) V_{CC}^2 + C_{ISO} V_{ISO}^2 \tag{1b}$$

$$E_{DB\_R} = L \cdot S_{DB\_R} C_{DB} V_{CC}^2 \tag{1c-1}$$

$$E_{DB\_W} = L \cdot S_{DB\_W} C_{DB} V_{CC}^2 + \sigma_{BL} L \cdot C_{BL} V_{CC}^2 \tag{1c-2}$$

$$E_{IO} = \sigma_{IO} L \cdot S_{IO} C_{IO} V_{CC}^2 \tag{1d}$$
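Equations (1a)–(1d) translate directly into code. The sketch below is a minimal illustration under stated assumptions: the class name, field names and all numerical values are hypothetical, and each capacitance is assumed to already include the switching current ratio $\alpha$ of its driver as described in Sect. 2.1.

```python
from dataclasses import dataclass


@dataclass
class EdramBank:
    """Per-bank parameters of Eq. (1); example values are hypothetical."""
    num_bl_pairs: int   # M
    num_db_pairs: int   # L
    c_wl: float         # capacitances in farads, alpha already included
    c_rx: float
    c_bl: float
    c_srto: float
    c_iso: float        # set to 0.0 when no shared BLSA is used
    c_db: float
    c_io: float
    vcc: float          # supply voltages in volts
    vpp: float
    viso: float
    s_bl: float         # signal swing coefficients
    s_db_r: float
    s_db_w: float
    s_io: float
    sigma_bl: float     # average bit update ratios
    sigma_io: float


def e_wl(p):
    """Eq. (1a): word line and high voltage supply line."""
    return (p.c_wl + p.c_rx) * p.vpp ** 2


def e_bl(p):
    """Eq. (1b): bit lines, sensing-restoring lines and BLSA isolation."""
    return (p.s_bl * p.num_bl_pairs * (p.c_bl + p.c_srto) * p.vcc ** 2
            + p.c_iso * p.viso ** 2)


def e_db_r(p):
    """Eq. (1c-1): data bus during a read."""
    return p.num_db_pairs * p.s_db_r * p.c_db * p.vcc ** 2


def e_db_w(p):
    """Eq. (1c-2): data bus during a write plus the overwritten bit lines."""
    return (p.num_db_pairs * p.s_db_w * p.c_db * p.vcc ** 2
            + p.sigma_bl * p.num_db_pairs * p.c_bl * p.vcc ** 2)


def e_io(p):
    """Eq. (1d): I/O lines."""
    return p.sigma_io * p.num_db_pairs * p.s_io * p.c_io * p.vcc ** 2


# Hypothetical bank, chosen only to exercise the formulas.
bank = EdramBank(
    num_bl_pairs=1024, num_db_pairs=64,
    c_wl=1e-12, c_rx=1e-12, c_bl=0.1e-12, c_srto=0.1e-12, c_iso=0.0,
    c_db=1e-12, c_io=1e-12,
    vcc=2.0, vpp=4.0, viso=4.0,
    s_bl=0.5, s_db_r=0.25, s_db_w=1.0, s_io=1.0,
    sigma_bl=0.5, sigma_io=0.5,
)
for name, fn in [("E_WL", e_wl), ("E_BL", e_bl), ("E_DB_R", e_db_r),
                 ("E_DB_W", e_db_w), ("E_IO", e_io)]:
    print(f"{name} = {fn(bank) * 1e12:.1f} pJ")
```

Keeping one function per equation makes it easy to swap in a different swing coefficient for a given design, for example a small `s_db_r` for a design that uses small-swing reads.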

## 3. Verifications and Discussions

The estimation results obtained from the SSBA power-energy model are compared with the transistor-level simulation results of three fabricated eDRAMs. Each design uses a different signal swing, circuit and process technology, as summarized in Table 1 [11]–[13]. Designs A and B, which feature the read-modify-write (RMW) scheme, are fabricated using $0.35 \,\mu m$ and $0.18 \,\mu m$ embedded memory logic (EML) technology, respectively. Design C, fabricated using $0.16 \,\mu m$ DRAM technology, adopts the single bit-line writing (SBW) scheme for selective bit-line swing [13].

| Design | Process      | Power supply V<sub>CC</sub>/V<sub>PP</sub> (V) | BL Swing       | DB Pre-charge      | DB Swing | Access Type |
|--------|--------------|------------------------------------------------|----------------|--------------------|----------|-------------|
| A      | 0.35 μm EML  | 3.3/5.0                                        | Direct + CC SA | 0.5V<sub>CC</sub>  | Full     | RMW\*       |
| B      | 0.18 μm EML  | 2.2/4.0                                        | CC SA          | V<sub>CC</sub>     | Small    | RMW\*       |
| C      | 0.16 μm DRAM | 2.0/3.3                                        | SBW Scheme     | 0.5V<sub>CC</sub>  | Small    | R/W\*\*     |

Table 1 Design characteristics used in verification.

(\*RMW: Read-Modify-Write, \*\*R/W: Read or Write)

Fig. 5 The comparison of power consumption between SSBA model and transistor level simulation results.

The verification results in Fig. 5 show that eDRAM power-energy estimation using the SSBA model achieves an estimation accuracy of over 95% compared with transistor-level power simulation results obtained with PowerMill. This good match results from the precise modeling of the various analog signal characteristics used in each design, as follows. In design A, the DB power consumption of the read operation is almost equal to that of the write operation, as shown in Fig. 5, because full swing takes place in both read and write operations, as summarized in Table 1. In design B, however, the DB power consumption of the read operation is smaller than that of the write operation because of the small-swing DB operation used for fast reads. The SSBA model captures this feature, as shown in Fig. 5. Furthermore, design C adopts the SBW scheme, which allows only one line of the BL pair to be activated [13]. The SSBA model also captures this feature of SBW, as shown in the verification results. The proposed SSBA model achieves agreement within 5% error for the various analog signals, which are difficult to estimate with conventional power-energy models.

For system-level power-energy estimation, the proposed SSBA model cooperates with a conventional high-level system simulation environment, as shown in Fig. 6. The overall eDRAM power-energy consumption is obtained by combining the SSBA model with memory access statistics such as the number of row activations $(N_{ROW})$, the number of column activations ($N_{COL\_R}$ for read and $N_{COL\_W}$ for write) and the bit update ratio ($\sigma$), as shown in Fig. 6. This approach enables highly accurate system-level power-energy estimation that is two or three orders of magnitude faster than transistor-level power-energy verification.

**Fig. 6** System level power-energy estimation using the SSBA model (gray line: conventional system level simulation; dark line: additional eDRAM power-energy estimation).
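The combination of access statistics and per-event energies can be sketched as a weighted sum. The letter does not spell out the exact formula, so the snippet below rests on labeled assumptions: a row activation is taken to cost $E_{WL} + E_{BL}$, a column read $E_{DB\_R} + E_{IO}$, and a column write $E_{DB\_W} + E_{IO}$. The function names and all numbers are hypothetical.

```python
def total_energy(n_row, n_col_r, n_col_w, e_row, e_col_r, e_col_w):
    """Total eDRAM energy in joules from access counts and per-event energies.

    Assumed per-event energies (not stated explicitly in the letter):
      e_row   = E_WL + E_BL      per row activation
      e_col_r = E_DB_R + E_IO    per column read
      e_col_w = E_DB_W + E_IO    per column write
    """
    return n_row * e_row + n_col_r * e_col_r + n_col_w * e_col_w


def average_power(energy_j, elapsed_s):
    """Average power in watts over the simulated interval."""
    return energy_j / elapsed_s


# Hypothetical access statistics from a high-level simulation run.
N_ROW, N_COL_R, N_COL_W = 1000, 4000, 2000
E_ROW, E_COL_R, E_COL_W = 400e-12, 200e-12, 400e-12  # joules per event
ELAPSED = 1e-3                                       # seconds

energy = total_energy(N_ROW, N_COL_R, N_COL_W, E_ROW, E_COL_R, E_COL_W)
power = average_power(energy, ELAPSED)
print(f"energy = {energy * 1e6:.2f} uJ, average power = {power * 1e3:.2f} mW")
```

Because the per-event energies are computed once per design, only the counters need to be updated during system simulation, which is consistent with the speed advantage described above.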

## 4. Conclusion

The signal swing-based analytical (SSBA) eDRAM power-energy model achieves an estimation accuracy of about 95%. This is because the SSBA model accurately captures the special analog behaviors of eDRAM signals. The SSBA model combined with a high-level simulator provides fast and accurate system-level power-energy estimation of eDRAM.

## References

- [1] Y. Agata, K. Motomochi, Y. Fukushima, M. Shirahama, M. Kurumada, N. Kuroda, H. Sadakata, K. Hayashi, T. Yamada, K. Takahashi, and T. Fujita, "An 8-ns random cycle embedded RAM macro with dual-port interleaved DRAM architecture (D<sub>2</sub>DRAM)," IEEE J. Solid-State Circuits, vol.35, no.11, pp.1668–1672, Nov. 2000.
- [2] O. Takahashi, S.H. Dhong, M. Ohkubo, S. Onishi, R.H. Dennard, R. Hannon, S. Crowder, S.S. Iyer, M.R. Wordeman, B. Davari, W.B. Weinberger, and N. Aoki, "1-GHz fully pipelined 3.7 ns address access time $8 k \times 1024$ embedded synchronous DRAM macro," IEEE J. Solid-State Circuits, vol.35, no.11, pp.1673–1679, Nov. 2000.

- [3] Y. Yokoyama, N. Itoh, M. Hasegawa, M. Katayama, H. Akasaki, M. Kaneda, T. Ueda, Y. Tanaka, E. Yamasaki, M. Todokoro, K. Toriyama, H. Miki, M. Yagyu, K. Takashima, T. Kobayashi, S. Miyaoka, and N. Tamba, "A 1.8 V embedded 18 Mb DRAM macro with a 9-ns RAS access time and memory cell area efficiency of 33%," IEEE J. Solid-State Circuits, vol.36, no.3, pp.503–509, March 2001.
- [4] T. Okuda, I. Naritake, T. Sugibayashi, Y. Nakajima, and T. Murotani, "A 12-ns 8-Mbyte DRAM secondary cache for a 64-bit microprocessor," IEEE J. Solid-State Circuits, vol.35, no.8, pp.1153–1158, Aug. 2000.
- [5] A. Yamazaki, T. Fujino, K. Inoue, I. Hayashi, H. Noda, N. Watanabe, F. Morishita, J. Ootani, M. Kobayashi, K. Dosaka, Y. Morooka, H. Shimano, S. Soeda, A. Hachisuka, Y. Okumura, K. Arimoto, S. Wake, and H. Ozaki, "A 56.8 GB/s 0.18  $\mu$ m embedded DRAM macro with dual port sense amplifier for 3D graphics controller," ISSCC Digest of Technical Papers, pp.394–395, Feb. 2000.
- [6] K. Hardee, O.F. Jones, M. Parris, D. Butler, L. Aldrich, P. Austin, K. Jacobsen, M. Miyabayashi, K. Taniguchi, and T. Arakawa, "A 1.43 GHz per data I/O 16 Mb DDR low power embedded DRAM macro for a 3D graphics engine," ISSCC Digest of Technical Papers, pp.386–387, Feb. 2001.
- [7] M.B. Kamble and K. Ghose, "Analytical energy dissipation models for low power caches," Proc. 1997 International Symposium on Low Power and Electronics Design, pp.143–148, 1997.

- [8] R.J. Evans and P.D. Franzon, "Energy consumption modeling and optimization for SRAM's," IEEE J. Solid-State Circuits, vol.30, no.5, pp.571–579, May 1995.
- [9] D. Liu and C. Svensson, "Power consumption estimation in CMOS VLSI chips," IEEE J. Solid-State Circuits, vol.29, no.6, pp.663–670, June 1994.
- [10] T. Sugibayashi, T. Takeshima, I. Naritake, T. Matano, H. Takada, Y. Aimoto, K. Furuta, M. Fujita, T. Saeki, H. Sugawara, T. Murotani, N. Kasai, K. Shibahara, K. Nakajima, H. Hada, T. Hamada, N. Aizaki, T. Kunio, E. Kakehashi, and Masumori, "A 30 ns 256-Mb DRAM with a multidivided array structure," IEEE J. Solid-State Circuits, vol.28, no.11, pp.1092–1098, Nov. 1993.
- [11] Y.-H. Park, S.-H. Han, J.-H. Lee, and H.-J. Yoo, "A 7.1 GB/s low power rendering engine in 2D array embedded memory logic CMOS for portable multimedia system," IEEE J. Solid-State Circuits, vol.36, no.6, pp.944–955, June 2001.
- [12] R. Woo, C.-W. Yoon, J. Kook, S.-J. Lee, K. Lee, Y.-H. Park, and H.-J. Yoo, "A 120 mW embedded 3D graphics rendering engine with 6 Mb logically local frame buffer and 3.2 Gbyte/s run-time reconfigurable bus for PDA-chip," Symposium on VLSI Circuits Digest of Technical Paper, pp.95–98, June 2001.
- [13] J. Kook and H.-J. Yoo, "A single bit line writing scheme for low power reconfigurable I/O DRAM macro," IEEE European Solid-State Circuit Conference of Digest of Technical Paper, pp.420–423, Sept. 2000.